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Abstract 

We propose a method for predicting helical membrane protein structures by computer 
simulations. Our method consists of two parts. In the first part, amino- acid sequences 
of the transmembrane helix regions are obtained from one of existing WWW servers. 
In the second part, we perform a replica-exchange simulation of these transmembrane 
helices with some constraints and identify the predicted structure as the global-minimum- 
energy state. We have tested the method with the dimeric transmembrane domain of 
glycophorin A. The structure obtained from the prediction was in close agreement with 
the experimental data. 

1 Introduction 

It is one of the most important problems to predict protein tertiary structures from the 
amino-acid sequence information in the structural genomics era. It is estimated that 20- 
30 % of all genes in most genomes encode membrane proteins Ej. However, only a 
small number of detailed structures have been obtained for membrane proteins because 
of technical difficulties in experiments such as high quality crystal growth. Therefore, it 
is desirable to develop a method for predicting membrane protein structures by computer 
simulations (for previous attempts, see, for instance, Refs. [H| [E])- In particular, struc- 
tures of membrane proteins are simpler than those of soluble proteins and most membrane 
proteins are composed of several transmembrane helices (we do not consider /5-sheet mem- 
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brane proteins in the present work). Taking it into consideration, it is possible to search 
more limited conformational space than that of usual simulations. In other words, the 
prediction of membrane protein structures may be easier than that of soluble proteins 
because there is only one type of secondary structures. 

In the two-stage model jH] , individual helices of a membrane protein are postulated to 
be stable separately as domains in a lipid bilayer and then side-to-side helix association 
is driven, resulting in a functional protein. In fact, some experimental evidence indi- 
cates that the formation of a-helices and the positioning of transmembrane helices are 
independent [TU] ITT] . Separated fragments of bacteriorhodopsin formed independently 
a-helical conformations in the membrane, the native structure could be recovered mixing 
the fragments. Therefore it is reasonable to assume that processes of helix formation and 
positioning may be predicted separately. 

Our prediction method thus consists of two parts. In the first part, amino- acid se- 
quences of the transmembrane helix regions are obtained from database analyses PP |12j- 
[TH] . In the second part, we perform a replica-exchange simulation of these transmembrane 
helices with some constraints and identify the predicted structure as the global-minimum- 
energy state. However, it is difficult to obtain a global-minimum state in potential energy 
surface by conventional molecular dynamics (MD) or Monte Carlo (MC) simulations. This 
is because there exist a huge number of local-minimum-energy states, and the simulations 
tend to get trapped in one of the local-minimum states. One popular way to overcome 
this multiple-minima problem is to perform a generalized-ensemble simulation (for re- 
views, see Refs. [HI EDI), which is based on non-Boltzmann probability weight factors so 
that a random walk in potential energy space may be realized. The random walk allows 
the simulation to go over any energy barrier and sample much wider configurational space 
than by conventional methods. One of well-known generalized-ensemble algorithms is the 
replica-exchange method (REM) [2JJ [2H] (the method is also referred to as parallel tem- 
pering [21! )• We apply this method to the structure prediction of membrane proteins. 
The dimeric transmembrane domain of glycophorin A is often used as a model system of 
helix-helix interaction of membrane proteins E] • In this Letter, we test our prediction 
method with this system. 

Sec. 2 summarizes the details of our method for predicting transmembrane helix config- 
urations. In Sec. 3 the results of the application to the structure prediction of the dimeric 
transmembrane domain of glycophorin A are given. Section 4 is devoted to conclusions. 

2 Methods 

Our method consists of two parts. In the first part, amino- acid sequences of the trans- 
membrane helix regions of the target protein are identified. It is already established that 
the transmembrane helical segments can be predicted by analyzing mainly the hydropho- 
bicity of amino acid sequences, without having any information about the higher order 
structures. There exist many WWW servers such as TMHMM 0, MEMSAT [E|, SOSUI 
[T7j . and HMMTOP [IB] in which given the amino-acid sequence of a protein they judge 
whether the protein is a membrane protein or not and (if yes) predict the regions in the 
amino-acid sequence that correspond to the transmembrane helices. 

In the second part, we perform a REM simulation of these transmembrane helices that 
were identified in the first part. Given the amino-acid sequences of transmembrane helices, 
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we first construct ideal canonical a-helices (3.6 residues per turn) of these sequences. For 
our simulations, we introduce the following rather drastic approximations: (1) We treat 
the backbone of the a-helices as rigid body and only side-chain structures are made 
flexible. (2) We neglect the rest of the amino acids of the membrane protein (such as 
loop regions). (3) We neglect surrounding molecules such as lipids. In principle, we 
can also use molecular dynamics method, but we employ Monte Carlo algorithm here. 
We update configurations with rigid translations and rigid rotations of each a-helix and 
torsion rotations of side chains. We use a standard force field such as CHARMM [23 
Ed] for the potential energy of the system. We also add the following simple harmonic 
constraints to the original force-field energy: 
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where is the total number of transmembrane helices in the protein and 9{x) is the 
step function: 



9{x) 



1 for x > 0, 
otherwise, 



and ki, ki, and k% are the force constants of the harmonic constraints, r^j+i is the distance 
between the C atom of the C-terminus of the z-th helix and the N atom of the N-terminus 
of the (i + l)-th helix, z\ and zf are the z-coordinate values of the C a (or C) atom of 
the N-terminus (or C-terminus) of the i-ih helix near the fixed lower boundary value z\ 
and the upper boundary value Zq of the membrane, respectively, rc a are the distance of 
Cq, atoms from the origin, and c^+i, d\, d^ , and dc a are the corresponding central value 
constants of the harmonic constraints. 

The first term in Eq. (0) is the energy that constrains pairs of adjacent helices along 
the amino-acid chain not to be apart from each other too much (loop constraints). This 
term has a non-zero value only when the distance r^+i becomes longer than di t i+i. 

The second term in Eq. (£[} is the energy that constrains helix N-teminus and C- 
terminus to be located near membrane boundary planes. This term has a non-zero value 
only when the C atom of each helix C-terminus and C Q atom of each helix N-terminus 
are apart more than d\ (or df). Base on the knowledge that most membrane proteins 
are placed in parallel, this constraint energy is included so that helices are not too apart 
from the perpendicular orientation with respect to the membrane boundary planes. 

The third term in Eq. {1} is the energy that constrains all C a atoms within the sphere 
(centered at the origin) of radius dc a . This term has a non-zero value only when C a atoms 
go out of this sphere. This term is introduced so that the center of mass of the molecule 
stays near the origin. The radius of sphere is set to a large value in order to guarantee 
that a wide configurational space is sampled. 
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We now briefly review the replica-exchange method (REM) [21] [2S] (see Refs. [231201 
for details). The system for REM consists of M non-interacting copies (or, replicas) of the 
original system in the canonical ensemble at M different temperatures T m (m = 1, M). 
Let X = (..., xj^, ...) stand for a state in this ensemble. Here, the superscript i and 
the subscript m in x^] label the replica and the temperature, respectively. The state X 
is specified by M sets of x^j, which in turn is specified by the coordinates of all the 
atoms in replica i. A simulation of the REM is then realized by alternately performing the 
following two steps. Step 1: Each replica in canonical ensemble of the fixed temperature is 
simulated simultaneously and independently for a certain MC or MD steps. Step 2: A pair 
of replicas, say % and j, which are at neighboring temperatures T m and T n , respectively are 
exchanged: X — (..., x{^, x$, ...) — > X' = (..., xj$, x$, ...). The transition probability 
of this replica exchange is given by the Metropolis criterion: 

«,(* X') = = { Lp(-A) oThetlse ; < 3 > 

where 

A = (/3 m - (3 n ) (E(q^) - E{q®j) . (4) 

Here, E(q^) and .E(g^) are the potential energy of the i-th and the j-th replica, re- 
spectively. In the present work, we employ Monte Carlo algorithm for Step 1. There are 
2Nn + Nn kinds of Monte Carlo moves, where Nd is the total number of dihedral angles in 
the side chains of Nu helices. The first term corresponds to the rigid translation and rigid 
rotation of the helices and the second to the dihedral-angle rotations in the side chains. 
One MC sweep is defined to consist of 2A^h + Nd updates that are randomly chosen from 
these MC moves with the Metropolis evaluation for each update. 

We predict the native structure of membrane spanning regions from the global-minimum- 
energy state obtained by the REM simulations. 

3 Results and discussion 

In the first part of the present method, we obtain amino-acid sequences of the transmem- 
brane helix regions from existing WWW servers such as those in Refs. El EI] • 
However, the precision of these programs in the WWW servers is about 85 % and needs 
improvement. We thus focus our attention on the effectiveness of the second part of out 
method, leaving this improvement to the developers of the WWW servers. Namely, we 
use the experimentally known amino-acid sequence of helices and try to predict their con- 
formations, following the prescription of the second part of our method described in the 
previous section. We selected the amino-acid sequence of the transmembrane dimer of 
glycophorin A (PDB code: 1AFO). The number of amino acids for each helix is 18 and 
the sequence is TLIIFGVMAGVIGTILLI. 

At first, the ideal canonical a-helix (3.6 residues per turn) of this sequence was con- 
structed. The N and C termini of this helix were blocked with acetyl and N-methyl 
groups, respectively. The force field that we used is the CHARMM paraml9 parameter 
set (united-atom model) [23 EH] • No cutoff was introduced to the non-bonded energy 
terms, and the dielectric constant e was set equal to 1.0. We have also studied the case 
of e = 4.0, because it is the value close to that for the lipid environment. The computer 
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code based on the CHARMM macromolecular mechanics program j2Hj was used and the 
replica-exchange method was implemented in it. 

This helix structure was minimized subject to harmonic restraints on all the heavy 
atoms. The initial configuration for the REM simulation was that two a-helices of identical 
sequence and structure thus prepared were placed in parallel at a distance of 20 A. These 
helices are quite apart from each other and the starting configuration is indeed very 
different from the native one. Note that the only information derived from the NMR 
experiments j2H] is the amino-acid sequence of the individual helices. 

The values of the constants in the constraints in Eq. (£[]) were set as follows: Ah = 2, 
fa = k 2 = 0.5 kcal/(mol A 2 ), k 3 = 0.05 kcal/(mol A 2 ), d iji+1 = 20 A, z\ = -13.35 A, 
Zq = +13.35 A, d\ — df — 1.0 A, and dc a = 50 A. The values for z\ and Zq were taken 
from the z-coordinates of the initial configuration (in Fig. Efa) below; the z axis is placed 
horizontally in the figure). The values of d\ and df are set loosely in the sense that the 
constraint energy will not be turned on until the interhelix angle of two helices becomes 
more than about 45°. In the present example of glycophorin A dimer, the first term in 
Eq. Q were imposed on both terminal ends (i.e., two kinds of r^+i were prepared: one 
is the distance between a pair of N atoms at the N-terminus of the two helices and the 
other is the distance between a pair of C atoms at the C-terminus of the two helices). 

We performed two REM MC simulations of 1,000,000 MC sweeps, starting from this 
parallel configuration: one with the dielectric constant e = 1.0 and the other with e = 4.0. 
We used the following 13 temperatures: 200, 239, 286, 342, 404, 489, 585, 700, 853, 
1041, 1270, 1548, and 1888 K, which are distributed almost exponentially. The highest 
temperature was chosen sufficiently high so that no trapping in local-minimum-energy 
states occurs. This temperature distribution was chosen so that all the acceptance ratios 
are almost uniform and sufficiently large (> 10 %) for computational efficiency Backbone 
structures were fixed during simulations and Monte Carlo move type was taken to be rigid 
translation of each helix, rigid rotation of each helix, and torsion-angle rotations of side 
chains. 

We first discuss the results with e = 1.0. In Fig. ^the canonical probability distri- 
butions of the total potential energy obtained at the chosen 13 temperatures from the 
REM simulation are shown. We see that there are enough overlaps between all neigh- 
boring pairs of distributions, indicating that there will be sufficient numbers of replica 
exchange between pairs of replicas. The obtained acceptance ratio of the replica exchange 
are listed in Table H We see that the acceptance ratio of replica exchange between all 
pairs of neighboring temperatures is large enough as expected in Fig The results in 
Table pimply that one should observe a free random walk in the replica and temperature 
space. 

In Fig. |2fa) we show the "time series" of replica exchange at the lowest temperature 
(T = 200 K). We see that every replica takes the lowest temperature many times, and we 
indeed observe a random walk in the replica space. The complementary picture to this 
is the temperature exchange for each replica. The results for one of the replicas (Replica 
8) are shown in Fig. Efb). We again observe a random walk in the temperature space 
between the lowest and highest temperatures. Other replicas perform random walks in 
the same way. In Fig. 0(c) the corresponding time series of the total potential energy is 
shown. We see that a random walk in the potential energy space between low and high 
energies is also realized. Note that there is a strong correlation between the behaviors 
in Figs. E^b) and ^c) as there should. All these results confirm that the present REM 
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simulation has been properly performed. 

We now study how widely the configurational space is sampled during the present 
simulation. For this purpose, we plot the time series of the root-mean-square (RMS) 
deviation of the backbone atoms from the NMR structure [2^ in Fig. [2jd). When the 
temperature becomes high, the RMS deviation takes a large value (the largest value in 
Fig. Efd) is 13.9 A, and the maximum value among all the replicas is 15.7 A), and when 
the temperature becomes low, the RMS deviation takes a small value (the smallest value 
in Fig. IHd) is 0.48 A, and the minimum value among all the replicas is 0.47 A). By 
comparing Fig. Efc) and Fig. Efd), we see that there is a strong correlation between the 
total potential energy and the RMS deviation values. In particular, it is remarkable that 
when the energy is the lowest (around —1490 kcal/mol), most of the RMS values are as 
small as about 0.5 A. This implies that the global-minimum-energy state is indeed very 
close to the native structure. 

In Fig. El typical snapshots from the REM simulation of Fig. |2 are shown. Fig. Of a) is 
the initial configuration of this simulation, in which the two helices are in parallel. We see 
that this simulation sampled many non-native configurations such as those in Fig. Efb) 
and Fig. Ofc). At low temperatures low-energy configurations such as that in Fig. Efd) 
with the side chains packed are sampled. We see that the REM simulation performs a 
random walk not only in energy space but also in conformational space and that it does 
not get trapped in one of a huge number of local-minimum-energy states. 

In Fig. 13] the configuration obtained by the NMR experiments [29J and the global- 
minimum-energy configurations obtained by the REM simulations are compared. Here, 
we also show the result with the dielectric constant e = 4.0. At first sight, it is rather 
surprising that the result with e = 1.0 is much closer to the experimental result than that 
with e = 4.0, because the dielectric constant for a lipid system is closer to 4.0 than to 
1.0. However, on second thoughts we understand that the present results are reasonable 
because the pairs of helices in transmembrane proteins are tightly packed and almost no 
lipid molecules can exist between helices. This implies that helix-helix interactions are 
the main driving force in the final stage of the structure formation of membrane proteins. 

4 Conclusions 

In this Letter we proposed a method for predicting the membrane spanning structures of 
helical membrane proteins. Our method consists of two parts. In the first part, amino- 
acid sequences of the transmembrane helix regions of the target protein are obtained from 
one of existing WWW servers. The precision of these programs in the WWW servers is 
at present about 85 %, but it is expected to be further improved. In the second part of 
our method, we perform a generalized-ensemble simulation of these transmembrane helices 
with atomistic details to obtain the global-minimum (free) energy state, which we identify 
as the predicted structure. In order to save computation time, we introduced rather 
bold approximations: Backbones are treated as rigid body (only side-chain structures 
are made flexible) and the rest of the protein such as loop regions and the surrounding 
lipids are neglected. With these assumptions, however, the example that we tested gave a 
remarkable agreement of the predicted structure with the experimental data. We believe 
that the inclusion of atomistic details of side chains was particularly important because 
transmembrane helices are usually tightly packed. This fact also justifies the validity of our 
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assumptions in the sense that almost no lipid molecules can exist between helices. In the 
future we have to make our approximations better. For instance, we should introduce some 
flexibility in the helix backbone structures. The electrostatic interactions, in which we 
used the dielectric constant value of 1.0, can also be modified so that some environmental 
effects may be taken into account. 
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Table 1: Acceptance ratio of replica exchange corresponding to pairs of neighboring tem- 
peratures 



Pairs of temperatures Acceptance ratio 



200 4— 


■* 239 K 


0.41 


239 4— 


■» 286 K 


0.40 


286 4— 


■» 342 K 


0.39 


342 4— 


■* 404 K 


0.40 


404 4— 


4 489 K 


0.32 


489 4— 


■* 585 K 


0.34 


585 4— 


+ 700 K 


0.33 


700 4— 


■* 853 K 


0.28 


853 4— 


4 1041 K 


0.29 


1041 4- 


-> 1270 K 


0.36 


1270 4- 


-> 1548 K 


0.42 


1548 4- 


-> 1888 K 


0.46 



P(E) 
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0.03 
0.02 
0.01 
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Figure 1: The canonical probability distributions of the total potential energy obtained 
from the replica-exchange MC simulation at the thirteen temperatures. The distributions 
correspond to the following temperatures (from left to right): 200, 239, 286, 342, 404, 
489, 585, 700, 853, 1041, 1270, 1548, and 1888 K, 
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(a) (b) 




200000 400000 600000 800000 1000000 200000 400000 600000 800000 1000000 

MC sweep MC sweep 



Figure 2: Time series of replica exchange at T = 200 K (a), temperature exchange for 
one of the replicas (Replica 8) (b), the total potential energy for Replica 8 (c), and the 
RMS deviation (in A) of backbone atoms from the NMR structure for Replica 8 (d). 
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Figure 3: Typical snapshots from the REM simulation of Fig. 2. The initial configuration 
(a), the configurations at 294,931 MC sweep (b), at 732,000 MC sweep (c), and at 810,000 
MC sweep (d). 
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(a) 



(b) 



(c) 




Figure 4: (a) The NMR configuration (PDB code 1AF0, Model 16). (b) The global- 
minimum-energy configuration predicted by the REM simulation with the dielectric con- 
stant e = 1.0. (c) The global-minimum-energy configuration predicted by the REM simu- 
lation with e = 4.0. The RMS deviation from the native configuration of (a) is 0.59 A (b) 
and 4.48 A (c) with respect to all backbone atoms, and it is 1.31 A (b) and 5.55 A (c) 
with respect to all atoms. 
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